We planned to fit a linear regression on delay time in minutes. Before doing so, we needed to check whether the interactions between the continuous predictors and airline, month, and time period, respectively, were associated with total delay time in minutes. We explored this visually with scatter plots. To keep the plots readable, we restricted the axis ranges.
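As a rough illustration of what such an interaction looks like, the sketch below fits a separate regression line within each group and compares the slopes. It uses simulated data only; the names `toy`, `x`, `group`, and `true_slope` are made up for this example and do not come from the flight data.

```r
# Illustration only: simulated data, not the flight data used in this report.
set.seed(1)
n <- 300
group <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
x <- runif(n, 0, 60)

# Group C is given a steeper slope than A and B by construction.
# A slope that depends on the group is what an x-by-group interaction means.
true_slope <- c(A = 1, B = 1, C = 2)
delay <- 5 + unname(true_slope[as.character(group)]) * x + rnorm(n, sd = 5)
toy <- data.frame(delay = delay, x = x, group = group)

# Fit a simple regression within each group and extract the slope on x.
slopes <- sapply(
  split(toy, toy$group),
  function(d) coef(lm(delay ~ x, data = d))[["x"]]
)
print(round(slopes, 2))
```

In a scatter plot colored by group, clearly non-parallel trends like these are the visual cue that an interaction term may be worth including.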


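The continuous predictors fell into two groups: types of delay and weather variables. The helper functions below plot a continuous predictor against delay, colored by airline, month, or time period: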


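library(tidyverse)
library(plotly)

# Scatter plot of delay against a continuous predictor, colored by airline.
# `cont` is presumably a numeric vector with one value per row of raw_df.
cont_airline = function(cont){
  # Return the plot directly so it prints when the function is called.
  raw_df %>% 
    mutate(
      text_label = str_c("Airline: ", airline)
    ) %>% 
    plot_ly(x = ~cont, y = ~delay, color = ~airline,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}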

# Same plot, colored by month; months are ordered by date rather than alphabetically.
cont_month = function(cont){
  raw_df %>%
    mutate(
      text_label = str_c("Month: ", month),
      month = fct_reorder(month, date)
    ) %>% 
    plot_ly(x = ~cont, y = ~delay, color = ~month,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}
  
# Same plot, colored by time period (hour_c).
cont_hour = function(cont){
  raw_df %>% 
    mutate(
      text_label = str_c("Period: ", hour_c)
    ) %>% 
    plot_ly(x = ~cont, y = ~delay, color = ~hour_c,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}

Interaction for Continuous Predictors


[Interactive scatter plots, one tab per predictor:]

Types of Delay: Carrier Delay, Extreme Weather Delay, Late Arrival Delay, NAS Delay, Security Delay

Weather Specific: Temperature, Humidity, Visibility, Wind Speed


Based on the plots, two interactions looked potentially important:

  1. Carrier Delay * Airline

  2. Temperature * Month

As a result, we focused on these two interaction terms, in addition to the other predictors, when building the regression model.
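One way to follow up on a visual impression like this is a nested-model F-test that compares a main-effects model with one that adds the two interaction terms. The sketch below is a minimal example on simulated data; the data frame `toy`, its column names, the airline labels, and the effect sizes are all assumptions made for illustration, not the variables or results of this analysis.

```r
# Illustration only: simulated data with made-up column names and effects.
set.seed(2)
n <- 600
toy <- data.frame(
  airline       = factor(sample(c("A1", "A2", "A3"), n, replace = TRUE)),
  month         = factor(sample(c("Nov", "Dec", "Jan"), n, replace = TRUE)),
  carrier_delay = rexp(n, rate = 1 / 20),
  temperature   = rnorm(n, mean = 5, sd = 8)
)

# The slope on carrier_delay differs by airline by construction;
# temperature has the same effect in every month.
slope_by_airline <- c(A1 = 0.8, A2 = 1.0, A3 = 1.5)
toy$delay <- 10 +
  unname(slope_by_airline[as.character(toy$airline)]) * toy$carrier_delay +
  0.2 * toy$temperature +
  rnorm(n, sd = 10)

# Main effects only versus main effects plus the two interactions.
main_fit <- lm(delay ~ carrier_delay + airline + temperature + month, data = toy)
int_fit  <- lm(delay ~ carrier_delay * airline + temperature * month, data = toy)

# Nested-model F-test: do the interaction terms jointly improve the fit?
comparison <- anova(main_fit, int_fit)
print(comparison)
```

In an R formula, `a * b` expands to `a + b + a:b`, so the second model contains every term of the first plus the interaction terms. That nesting is what makes the `anova()` comparison valid.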